ABSTRACT Urinary tract infections (UTIs) are one of the most commonly acquired bacterial infections in humans, and uropathogenic
Escherichia coli strains are responsible for over 80% of all cases. The standard method for identification of uropathogens
in clinical laboratories is cultivation, primarily using solid growth media under aerobic conditions, coupled with morphological 
and biochemical tests of typically a single isolate colony. However, these methods detect only culturable microorganisms, and 
characterization is phenotypic in nature. Here, we explored the genotypic identity of communities in acute uncomplicated UTIs 
from 50 individuals by using culture-independent amplicon pyrosequencing and whole-genome and metagenomic shotgun se- 
quencing. Genus-level characterization of the UTI communities was achieved using the 16S rRNA gene (V8 region). Overall UTI 
community richness was very low in comparison to other human microbiomes. We strain-typed Escherichia-dominated UTIs 
using amplicon pyrosequencing of the fimbrial adhesin gene, fimH. There were nine highly abundant fimH types, and each UTI 
sample was dominated by a single type. Molecular analysis of the corresponding clinical isolates revealed that in the majority of 
cases the isolate was representative of the dominant taxon in the community at both the genus and the strain level. Shotgun se- 
quencing was performed on a subset of eight E. coli urine UTI and isolate pairs. The majority of UTI microbial metagenomic 
sequences mapped to isolate genomes, confirming the results obtained using phylogenetic markers. We conclude that for the 
majority of acute uncomplicated E. coli-mediated UTIs, single cultured isolates are diagnostic of the infection.

IMPORTANCE In clinical practice, the diagnosis and treatment of acute uncomplicated urinary tract infection (UTI) are based on 
analysis of a single bacterial isolate cultured from urine, and it is assumed that this isolate represents the dominant UTI patho- 
gen. However, these methods detect only culturable bacteria, and the existence of multiple pathogens as well as strain diversity 
within a single infection is not examined. Here, we explored bacteria present in acute uncomplicated UTIs using culture- 
independent sequence-based methods. Escherichia coli was the most common organism identified, and analysis of E. coli domi- 
nant UTI samples and their paired clinical isolates revealed that in the majority of infections the cultured isolate was representa- 
tive of the dominant taxon at both the genus and the strain level. Our data demonstrate that in most cases single cultured isolates 
are diagnostic of UTI and are consistent with the notion of bottlenecks that limit strain diversity during UTI pathogenesis. 
Urinary tract infections (UTIs) are one of the most common
bacterial infectious diseases of humans, responsible for an 
estimated 9.6 million doctor visits in the United States each 
year (1). Most are acute uncomplicated infections that occur in 
healthy individuals with no history of urological disorders. 
Uropathogens may colonize the urethral opening and infect 
the lower urinary tract up to the bladder, resulting in urethritis 
and cystitis, respectively (2). Some UTIs may progress further, 
resulting in infection of the kidneys (pyelonephritis), and even 
spread into the bloodstream, leading to a systemic infection 
known as urosepsis (3). 

Over 80% of UTIs have been attributed to Escherichia coli infections,
while other, less common uropathogens include a variety
of Gram-positive and Gram-negative bacteria such as Staphylo- 
coccus, Klebsiella, Serratia, Enterococcus, and Proteus species (4, 5). 
The standard method used for the identification of uropathogens 
in most clinical laboratories is microscopy followed by conven- 
tional microbiological culturing (6, 7). Culture-independent 
analysis using sequencing of the 16S rRNA gene has indicated that 
UTIs may be more polymicrobial than initially believed and has 
implicated organisms, such as Actinobaculum schaalii and Aero- 
coccus urinae, which are fastidious and may be overlooked by stan- 
dard techniques (8-10). Furthermore, 16S studies of culture- 
negative UTI samples and healthy urine samples have 
demonstrated that bacterial colonization can occur despite the
inability to cultivate organisms (8, 11-15). 

For culture-positive UTIs, molecular analysis based on se- 
quencing has shown that cultivation generally is able to identify 
the most dominant organism in infected urine at the species level 
(8, 10, 16). Numerous studies have compared cultured isolates 
across patient cross sections, but little is known about the strain- 
level diversity of uropathogens within individuals. The fimH gene 
has been used extensively as a phylogenetic marker for the char- 
acterization of uropathogenic E. coli (UPEC) isolates at the strain 
level (17-21). The fimH gene encodes the tip-located type 1 fimbrial
FimH adhesin that mediates binding to α-D-mannosylated
glycoproteins such as uroplakins on human bladder epithelial 
cells (ECs) (22-24) and is an essential virulence factor required for 
adhesion, invasion, intracellular bacterial community formation, 
and colonization of the bladder (25, 26). The sequence of fimH is 
highly conserved in virtually all E. coli strains sequenced to date 
(including nonuropathogens), with minor sequence variations 
often corresponding to functional differences, most likely the re- 
sult of adaptation to enhance pathogenesis (19, 27-34). Com- 
pared to other established E. coli genotyping methods such as mul- 
tilocus sequence typing (MLST) (20), typing based on fimH 
variations is a faster and more practical approach (18). A two- 
locus-typing scheme combining fimH with fumC (fumarase C) 
has also been shown to provide strong clonal discrimination 
power for molecular epidemiological analyses (21). 

Here, we profiled the microbial communities associated with 
50 uncomplicated UTIs using 16S rRNA and fimH amplicon py- 
rosequencing. These data were compared to the corresponding 
markers from the paired clinical strains obtained in the hospital 
pathology lab to determine how representative single clinical iso- 
lates are of underlying infections at both the genus (16S rRNA) 
and the strain (fimH) level. Eight paired urine UTI samples and 
E. coli isolates spanning the observed marker gene diversity were 
then shotgun sequenced to determine if fimH is sufficiently rep- 
resentative of strain-level diversity at the whole-genome level. 

RESULTS 

Patient demographics and clinical microbiology. Urine samples 
were obtained from 50 individuals with acute uncomplicated 
UTIs (see Table S1 in the supplemental material). The majority of
study subjects were female (76%), and patient ages ranged from
<1 year to 94 years (Table S1; median, 56 years). E. coli was the
most commonly isolated organism (70%) followed by Pseudomonas
aeruginosa (10%; Table S1). E. coli infection was associated
with age (P < 0.0001, Mann-Whitney U test) and gender (P =
0.03, Fisher's exact test), occurring most often in younger females. 
Age was also associated with E. coli type, with phylogroup B2 more 
common in younger individuals versus D in older individuals (P 
= 0.01, Mann-Whitney U test). Males had higher white blood cell 
(WBC) counts on average (P = 0.04, Mann-Whitney U test) than 
did females, but no other correlations between clinical parameters 
and gender were observed. 
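The association between a binary outcome (such as E. coli infection) and gender reported above rests on Fisher's exact test. A minimal pure-Python sketch of the two-sided test is given below for illustration; the function name and the 2x2 counts are hypothetical and are not the study data, and the analysis software actually used by the authors is not stated in this section.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Unnormalised hypergeometric weight for each possible top-left cell value
    weights = {x: comb(col1, x) * comb(n - col1, row1 - x)
               for x in range(lo, hi + 1)}
    observed = weights[a]
    # Sum every table that is no more probable than the observed one
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(n, row1)

# Hypothetical counts for illustration only (not the study data)
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p_value, 4))
```

Integer weights are compared rather than floating-point probabilities, which avoids rounding problems when two tables are equally likely.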

Cultured isolates are representative of UTI microbial com- 
munities at the genus level. For the majority of patients, the most 
abundant member of the microbial community as determined by 
16S pyrosequencing was concordant with the cultured isolate 
(Fig. 1). Over 80,000 amplicon sequences were obtained for the 50 
patients, with a median of 1,359 sequences per sample (normalized
to 1,000 for analysis). 16S sequences were analyzed using
QIIME and CD-HIT-OTU, and microbial community profiles
were compared to cultured isolates. The taxonomic identity of 
microbial isolates was verified by 16S Sanger sequencing (see Table
S1 in the supplemental material). In all cases, the genus corresponding
to the isolate(s) was observed in the profiles; however, in
eight samples (16%) the cultured genus was not the most domi- 
nant. Analysis of biological replicates from these samples indi- 
cated that this was not likely due to heterogeneity within urine 
samples (see Fig. S1). In three of the 50 cases (6%), the most
abundant genus in the community profile was anaerobic (either 
Anaerococcus or Peptoniphilus), while only aerobic organisms 
were cultured. No epithelial cells were identified microscopically 
in these samples; however, one of the Anaerococcus-dominated
urine samples was noted as containing debris, which could be 
indicative of contamination during sample collection and a source 
of additional microbial diversity. Both Anaerococcus and Pep- 
toniphilus species have also been identified in the urinary micro- 
biome of healthy individuals (35). For a further three individuals, 
Gram-negative Enterobacteriaceae were cultured, while Gram- 
positive organisms were found to be most dominant by 16S se- 
quencing (50 to 75%). In the two remaining individuals, different 
genera of Enterobacteriaceae were represented by the cultured iso- 
late and the most abundant constituent in the community pro- 
files. 
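Normalizing each sample to 1,000 sequences, as described above, can be illustrated with random subsampling without replacement. The sketch below assumes that this is what the normalization involved (the text does not say whether reads were subsampled or rescaled); the function name, the seed parameter, and the example profile are hypothetical.

```python
import random
from collections import Counter

def rarefy(counts, depth, seed=0):
    """Randomly subsample a {taxon: read_count} profile to a fixed depth
    without replacement."""
    # Expand the profile into one entry per read
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        raise ValueError("sample has fewer reads than the requested depth")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return Counter(rng.sample(pool, depth))

# Hypothetical profile totalling 1,359 reads (the reported median depth)
profile = {"Escherichia": 1200, "Lactobacillus": 100, "Anaerococcus": 59}
rarefied = rarefy(profile, 1000)
print(rarefied.most_common(1)[0][0])
```

Subsampling to a common depth keeps richness and dominance comparisons between samples from being driven by differences in sequencing effort.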

UTI microbial communities were significantly different be- 
tween age groups (P = 0.013, generalized PERMANOVA) (see 
Fig. S2A in the supplemental material). Principal component 
analysis using the generalized Unifrac distance indicated that 
communities dominated by Enterobacteriaceae, and more specif- 
ically E. coli, were more common in younger individuals. Older 
individuals tended to have communities with higher abundances 
of Pseudomonas and members of the phylum Firmicutes. Gender 
was a borderline significant factor influencing community com- 
position (P = 0.054) (see Fig. S2B). UTI samples from females 
were hallmarked by microbial communities with a high relative 
abundance of Enterobacteriaceae and/or the presence of lactic acid 
bacteria and other bacilli. In particular, the genera Streptococcus, 
Lactobacillus, and Staphylococcus were not detected in any males in 
the study, even at low relative abundance (Fig. 1). These results are
highly concordant with those based on isolate data alone and re- 
flect the high correspondence between cultured isolates and mi- 
crobial communities as described above. 

Overall diversity in UTI microbial communities was very low 
(see Fig. S3 in the supplemental material). Rarefaction analysis 
was used to compare operational taxonomic unit (OTU) richness 
between samples, and overall diversity was assessed using the 
Shannon index at a depth of 1,000 sequences (see Table S1). The
average UTI diversity by the latter metric is 0.51, which is lower 
than those of human skin (1 to 2.5) (36) and the gut (~6) (37). The 
Shannon index was inversely correlated with the relative abun- 
dance of E. coli (Tau = -0.54, P < 0.0001), and these UTIs had 
lower overall richness in general. E. coli infections also had a borderline
significant association with higher microbial loads as determined
by quantitative PCR (qPCR) (P = 0.061, Mann-Whitney U test).
No significant differences in community
diversity or microbial biomass associated with age or between 
males and females were observed. The most diverse UTI commu- 
nity, with a Shannon index of 3.84, was observed in an 86-year-old 
male (Fig. 1; indicated by asterisk). Clinical notes on this individ- 
ual's sample indicated that a large amount of debris was observed 
FIG 1 Microbial relative abundance and biomass in UTI samples. The heat map of genus (right)- and phylum (left)-level taxonomy for UTI microbial
community profiles is based on 16S pyrosequencing with each sample normalized to 1,000 sequence reads. Olive green boxes indicate males, and darker gray 
shading indicates older individuals. The genus of the cultured isolate for each UTI is indicated by a black border around the appropriate box. Sample numbers 
are given for those UTIs which also underwent fimH pyrosequencing, and two samples with noted higher taxonomic richness are marked with an asterisk and a 
plus sign. Relative microbial biomass was estimated by 16S quantitative PCR. 



in the urine. Increased diversity was also observed in a UTI community
from a 71-year-old female, which was dominated by E. coli
but also contained several members of the phyla Actinobacteria, 
Bacteroidetes, and Firmicutes (Fig. 1; indicated by plus sign). This 
sample had an epithelial cell count greater than 50, which is sug- 
gestive of contamination by normal microbiota of the anogenital 
region. 
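The Shannon index used above summarizes how evenly reads are spread across OTUs: a community dominated by one taxon scores near zero. A minimal sketch follows; the logarithm base is an assumption (base 2 is used here, since the text does not state it), and the function name and example counts are hypothetical.

```python
from math import log

def shannon_index(counts, base=2.0):
    """Shannon diversity H = -sum(p_i * log(p_i)) over OTUs with nonzero counts.

    The log base is an assumption; change `base` to match the tool in use.
    """
    total = sum(counts)
    return -sum((n / total) * log(n / total, base) for n in counts if n > 0)

# Hypothetical 1,000-read profiles for illustration
near_monoculture = [990, 5, 5]     # one dominant OTU
even_community = [250, 250, 250, 250]
print(shannon_index(near_monoculture) < shannon_index(even_community))
```

Four equally abundant OTUs give exactly 2 bits under base 2, whereas a profile in which one OTU holds nearly all reads gives a value close to zero, which is the pattern reported for E. coli-dominated UTIs.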

Isolates are congruent with the dominant strain in E. coli 
UTIs. Escherichia was the most common genus identified by both 
culture and community profiling in urine UTI samples. We fur- 
ther characterized Escherichia-positive samples by examining the 
urine UTI samples and their respective matching clinical isolates 
(n = 27) using the fimH gene as a strain-level marker. A 523-bp 
region of the fimH gene was targeted for amplicon pyrosequenc- 
ing using primers designed to amplify all fimH genes belonging to 
the genus Escherichia. These primers were used to profile Esche- 
richia strain-level diversity in the UTI samples, and representative 
sequences for all unique OTUs (clusters with 100% identity) were 
used to generate a phylogenetic tree with manually identified subtrees
(Fig. 2). Two additional urine UTI samples from which other
Enterobacteriaceae (Enterobacter aerogenes and Proteus mirabilis)
were cultured but which had >1% relative abundances of E. coli in
their community profiles were also included in the analysis. fimH 
sequences were generated from E. coli isolates using Sanger se- 
quencing. Isolates were also classified using the Clermont method, 
which classifies E. coli into phylogroups based on a multiplex PCR 
targeting four discriminating loci within E. coli genomes (38). Additionally,
the expression of functional type 1 fimbriae by all clinical
isolates was confirmed using a yeast agglutination assay (39).

In all cases where E. coli was cultured, the dominant fimH type
in the community matched the cultivated isolate (Fig. 2). A total of
33 distinct nucleotide sequences were observed in the UTI com- 
munities. Ten of these represented dominant fimH phylotypes 
across the 50 UTI samples examined (nucleotide dominant types; 
ND), all of which corresponded to their paired isolate sequences. 
The remaining 22 phylotypes were typically found in low abun- 
dance (nucleotide secondary types; NS) and were not represented 
by the cultured isolates. Three of the ND types were congruent 
with fimH lineages identified in ST131 subclones, including the 
globally distributed fluoroquinolone-resistant fimH30 subtype 
FIG 2 Heat map of strain-level taxonomy and fimH phylogeny for UTI microbial community profiles based on fimH pyrosequencing. The fimH type of the
cultured isolate for each UTI is indicated by a black border around the appropriate box. Each node of the phylogenetic tree corresponds to a unique 100% OTU 
representative sequence, and gray shading indicates manually identified subtrees. Amino acid substitutions are indicated by circles for each corresponding 
nucleotide sequence. The phylogroup is given for samples that had E. coli as the cultured isolate, and darker gray squares indicate higher white blood cell counts. 
Sample numbers are presented below the bar for white blood cell count. 



(40) (see Fig. S4 in the supplemental material). fimH diversity was
extremely low overall, and in 20 of 29 cases (69%), only one fimH 
type could be detected in UTI samples (Fig. 2; see also Fig. S5). 
Biological replicates were highly concordant (Fig. S5), and rar- 
efaction analysis showed no increase in fimH OTU richness even 
at a depth of 70,000 sequences (Fig. 2; see also Fig. S3C), indicating 
that these results were not due to sampling bias or insufficient 
sequencing. The diversity of fimH was notably higher in UTIs 
where E. coli was not the cultured isolate and for samples with 
epithelial cell counts greater than 50 (see Fig. S2B). 
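The comparison between community fimH types and the isolate sequence can be sketched as follows. The sketch assumes that reads have already been quality filtered and trimmed to the same amplicon region, so that a 100% identity OTU is simply a set of identical strings; the function names and the toy reads are hypothetical and do not reproduce the authors' pipeline.

```python
from collections import Counter

def fimh_profile(reads):
    """Group identical reads into 100%-identity OTUs and return their
    relative abundances, most abundant first."""
    if not reads:
        raise ValueError("no reads supplied")
    otus = Counter(reads)
    total = sum(otus.values())
    return {seq: n / total for seq, n in otus.most_common()}

def dominant_matches_isolate(reads, isolate_seq):
    """Return (match, abundance) for the dominant community fimH type."""
    profile = fimh_profile(reads)
    dominant = next(iter(profile))  # first key is the most abundant OTU
    return dominant == isolate_seq, profile[dominant]

# Toy reads for illustration only
reads = ["ACGTAC"] * 9 + ["ACGTAA"]
print(dominant_matches_isolate(reads, "ACGTAC"))
```

In real pyrosequencing data, low-abundance types can also arise from sequencing error, so a secondary type seen at very low abundance is not by itself evidence of a second strain.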

Translation of fimH nucleotide sequences resulted in 10 
unique amino acid sequence types, five of which were the domi- 
nant type in at least one UTI (Fig. 2; see Table S2 in the supple- 
mental material). Within the amino acid sequence types, five in- 
dividual substitutions were observed (Fig. 2; see Table S2), all of 
which have been previously described (19, 27, 32). None of the 
amino acid types or individual substitutions were significantly 
associated with microbial community diversity (as measured by 
the Shannon index) or total microbial load. FimH allele types were 
labeled according to the presence of amino acid changes in dominant
FimH types (AAD1 to -5) or low-abundance FimH subtypes
(AAS6 to -10) (see Table S2). FimH amino acid type AAD2 was 
significantly more abundant in UTIs that had phylogroup D isolates
(P = 0.0002), while AAD3 was associated with phylogroup
B2 (P = 0.0009). AAD1 had a more widespread distribution and
was the dominant FimH type in both of the UTIs not dominated 
by E. coli. AAD3 correlated significantly with an increased white 
blood cell (WBC) count (P = 0.005), while AAD1 was associated 
with a lower WBC count in general (P = 0.02). This reflected an 
underlying association between the amino acid substitutions 
N70S and S78N, both of which appear in AAD3 (see Table S2),
and a higher WBC count (P = 0.0008 and P = 0.001, respectively).
S78N was also correlated with patient age, occurring more often in 
younger individuals (P = 0.04). 
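Deriving amino acid types from nucleotide types, and naming substitutions in the N70S style used above, can be sketched as below. The sketch assumes in-frame, equal-length sequences containing only A, C, G, and T, and uses the standard genetic code; the function names and the `first_position` offset (the residue number of the first translated codon) are hypothetical, as the numbering used for the amplicon is not given in this section.

```python
# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def translate(nt_seq):
    """Translate an in-frame nucleotide sequence, ignoring a trailing partial codon."""
    nt_seq = nt_seq.upper()
    usable = len(nt_seq) - len(nt_seq) % 3
    return "".join(CODON_TABLE[nt_seq[i:i + 3]] for i in range(0, usable, 3))

def substitutions(reference_nt, variant_nt, first_position=1):
    """List amino acid changes between two aligned sequences, e.g. 'N70S'."""
    ref, var = translate(reference_nt), translate(variant_nt)
    return [f"{r}{first_position + i}{v}"
            for i, (r, v) in enumerate(zip(ref, var)) if r != v]

# Toy codons for illustration: AAT (Asn) -> AGT (Ser)
print(substitutions("AATGCT", "AGTGCT", first_position=70))
```

Because synonymous changes translate to the same residue, several nucleotide types can collapse into one amino acid type, which is why the number of amino acid types is smaller than the number of nucleotide types.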

Whole-genome analysis confirms fimH typing of UPEC 
strains. Eight E. coli-dominated urine UTI samples and their
paired cultured isolates were selected for shotgun metagenomic 
and genomic sequencing, respectively. Genomic contigs were as- 
sembled de novo from isolate sequence data (see Table S3 in the 
supplemental material), and metagenomic sequences were 
mapped to their paired isolate genome to further evaluate the 
strain-level diversity of E. coli and dominance of the cultured iso- 
late in the UTI (see Table S4). Metagenomic sequences were also 
mapped to the human genome to detect contaminating host 
genomic DNA. All UTI metagenomes contained sequences corre- 
sponding to human genomic DNA with relative abundances rang- 
ing from <1% to >80% (Fig. 3; see also Table S4). The two sam- 
ples (UQU-8 and UQU-57) with extremely low relative 



mBio mbio.asm.org



March/April 2014 Volume 5 Issue 2 e01064-13 



UTI Isolates Represent Dominant In Situ Populations 





FIG 3 Taxonomic classification of metagenomic sequences. Bars show the 
proportion of metagenomic sequences which mapped to the human genome. 
Pie charts show the proportions of non-human sequences which mapped to 
E. coli isolate scaffolds and those which were assigned to non-isolate E. coli 
genomes and other microbial genomes by MEGAN. 



abundances of human sequences (<1%) had white blood cell 
counts of <60, compared to >250 for all others, suggesting that 
WBCs may have been the major source of human genomic DNA. 
Sequences that did not map to isolate genomic contigs or the 
human genome were taxonomically classified using MEGAN as 
described in Materials and Methods. 

For all UTI metagenomes, the majority of microbial sequences 
identified corresponded to E. coli and more specifically mapped to 
the corresponding cultured isolate (Fig. 3). Community profiles 
for UQU-8, UQU-24, UQU-41, UQU-90, and UQU-98 identified 
Escherichia species at >90% abundance (Fig. 1), and correspond- 
ingly, >95% of microbial sequences in metagenomes for these 
UTIs corresponded to E. coli. Furthermore, in these five UTIs and 
all other cases, 80% or more of the E. coli sequences were repre- 



sentative of the isolate genome. In addition to high relative abun- 
dances of E. coli, UQU-26, UQU-50, and UQU-57 had significant 
(>5% of total sequences) populations of other microbes, includ- 
ing Streptococcus spp. (UQU-26 and UQU-57), Staphylococcus 
spp. (UQU-57), and A. urinae and Actinomycetales spp. (UQU- 
50). These results are consistent with 16S profiles for these sam- 
ples. 
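The proportions discussed above (human versus non-human reads, and isolate versus non-isolate E. coli) follow from simple read counts once each metagenomic read has been assigned to one category. The sketch below is illustrative only: the function name, the category names, and the counts are hypothetical, and unassigned reads are ignored for simplicity.

```python
def read_fractions(human, isolate_ecoli, other_ecoli, other_microbial):
    """Summarize metagenomic read assignments as the proportions used in the text."""
    total = human + isolate_ecoli + other_ecoli + other_microbial
    microbial = total - human
    ecoli = isolate_ecoli + other_ecoli
    return {
        "human_of_total": human / total,
        "ecoli_of_microbial": ecoli / microbial,
        "isolate_of_ecoli": isolate_ecoli / ecoli,
    }

# Hypothetical read counts for illustration only
fractions = read_fractions(human=5000, isolate_ecoli=4500,
                           other_ecoli=300, other_microbial=200)
print(fractions["isolate_of_ecoli"] >= 0.8)
```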

DISCUSSION 

Common clinical practice for acute uncomplicated UTIs is to di- 
agnose and treat the infection on the basis of characterization of a 
single cultured bacterial isolate, under the assumption that this 
isolate represents the dominant in situ population (6, 7). Recent 
evidence suggests that UTIs may be more polymicrobial than pre- 
viously suspected and also that detectable bacterial populations 
are present even in healthy urine (11, 14, 15, 41). Here, we used 
culture-independent sequencing-based methods to investigate 
microbial communities in urine samples from individuals with 
uncomplicated UTIs at the genus and strain level to determine if 
single isolates are adequate to describe the most abundant popu- 
lations in the disease state. 

For the majority of UTIs (80%), cultured isolates were repre- 
sentative of dominant organisms at the genus level. Previous 
culture-independent analyses of UTIs in hospitalized patients, 
catheter-associated UTIs, and asymptomatic bacteriuria demon- 
strated similar concordance between cultivation and molecular 
analysis for culture-positive specimens (8, 10, 16, 41). Compara- 
ble levels of agreement have been reported for other high-biomass 
bacterial infections, such as those implicated in cystic fibrosis lung 
disease (42) and chronic wounds (43). 

In three of 50 UTIs, anaerobic genera were more abundant 
than the isolate genus, while in a fourth, fastidious Aerococcus spp. 
were most dominant. The use of denaturing high-performance 
liquid chromatography coupled with 16S rRNA gene sequencing 
in infected urine samples first identified the widespread presence 
of fastidious bacteria, including obligate anaerobes that were pre- 
viously overlooked by routine microbiological cultivation (8, 10). 
While some of these, such as A. urinae (44, 45), have been impli- 
cated directly in the etiopathogenesis of UTIs, the role of others, 
especially anaerobes and Lactobacillus spp., is largely unclear, as 
they have often been isolated from polymicrobial infections in 
conjunction with more traditional pathogens such as E. coli (8, 10, 
46). Furthermore, community profiling using high-throughput 
amplicon pyrosequencing has subsequently demonstrated that 
these organisms are common constituents of both healthy and 
diseased urine microbiomes (11-15, 41), suggesting that they may
not be causally linked to UTI. 

The majority of UTIs were associated with the genus Esche- 
richia and more specifically E. coli, which is consistent with previ- 
ous literature (4, 5). Most of these were largely monomicrobial, 
with other genera present only in low or negligible (<1%) abun- 
dances. Strain typing of E. coli isolates from different UTI samples 
using fimH gene sequencing indicated a wide diversity of strains in
the study population. All E. coli strains were also shown to produce 
functional type 1 fimbriae. In order to determine if the isolated 
strains were representative of dominant E. coli populations within 
individual urine UTI samples, we used amplicon pyrosequencing 
of the fimH locus, as, in general, the phylogenetic resolution of 16S
amplicon pyrosequencing is limited to the genus level (47, 48). 
While fimH has been used in a number of studies to type UPEC 
isolates (18, 21, 27, 30, 32, 40), to our knowledge this is the first
instance where this marker has been used to profile UTI E. coli 
strain diversity in situ. Similar assays have also been developed for 
the rapid analysis of total species and/or strain-level community 
diversity of methanotrophs (49) and ammonia-oxidizing archaea 
(50) in environmental samples. 

In all urine UTI samples, cultured isolates represented the 
dominant and, in most cases, the only detectable E. coli strain. 
Three of the dominant fimH types corresponded to subclones of 
the globally disseminated multidrug-resistant ST131 clone of 
E. coli (40). Two of these, fimH30 and fimH41, represent common
and ubiquitous fimH lineages, present in E. coli from a range of 
hosts and geographical locations (40). For a few samples, addi- 
tional strains were detected but at extremely low abundances. This 
lies in stark contrast to other infections such as Pseudomo- 
nas aeruginosa respiratory infections, which have been shown to 
be highly heterogeneous and poorly described by single cultured 
isolates (51-53). P. aeruginosa respiratory disease is most often 
chronic, occurring in patients with anatomical or physiological 
abnormalities that render them susceptible to repeated bouts of 
infection, while acute uncomplicated UTI is more likely to be due 
to an episode of bacterial invasion of a normally closed system. 
Other types of UTIs, such as catheter-associated infections, may 
be more heterogeneous, as they allow for sustained exposure to 
colonizing microbial populations from external sources (16). 

E. coli has an extensive pangenome with a large reservoir of 
genes undergoing frequent lateral transfer (54-56), and thus, in- 
dividual marker genes may not be indicative of strain-level diver- 
sity. Multilocus sequence typing (MLST) has commonly been 
used for classification (57); however, it has been shown to lack 
discriminatory power, grouping genetically, phenotypically, and 
ecologically distinct strains under the same sequence type (21, 40). 
Molecular typing with fimH has been proposed as a feasible alter- 
native to MLST for large-scale epidemiological studies of E. coli 
(18, 21, 40). It has been shown that fimH can be used as a single 
marker or in conjunction with fumC to successfully discriminate 
ecologically and clinically relevant sublineages within MLST 
strain types (21, 40). Comparative analysis of the genomes of two 
ST131 isolates confirmed the validity of this approach, revealing 
marked divergence in a subset of shared core genes, including 
fimH, despite identical MLST loci in the two strains (58). Differ- 
ences in fimH loci were driven by homologous recombination as 
well as the acquisition of point mutations, which further distin- 
guished closely related strains with the same ancestral recombi- 
nant fimH alleles (58). Here, we performed shotgun sequencing 
on eight isolate-urine UTI pairs to corroborate the results of fimH 
amplicon pyrosequencing analysis and determine the validity of 
fimH as a strain-level marker. Overwhelmingly, the majority of 
metagenomic sequences of microbial origin from urine UTI sam- 
ples mapped back to their corresponding isolate genomes, con- 
firming that fimH is indicative of the overall genomic content of 
UPEC strains. 

The ability of fimH to uniquely define UPEC strains may be 
linked to infection bottlenecks that have been defined in the 
mouse UTI model (59). In mice, FimH-mediated binding to man- 
nosylated glycoproteins on bladder epithelial cells is critical for 
UPEC colonization and invasion of the uroepithelium and the 
initiation of cystitis (24, 60, 61). Enhanced mannose binding due 
to pathoadaptive mutations under high-shear-force conditions is 
also beneficial as it allows bacterial cells to resist clearance by mic- 



turition (30, 62, 63). Chen et al. demonstrated that fimH also 
exhibits a second class of mutations that are independent from 
those that enhance mannose binding and are important for the 
formation and proliferation of intracellular bacterial communi- 
ties (IBCs) (27). IBCs in mice infected with a panel of differentially 
tagged UPEC strains were clonally derived from a single invasive 
bacterium, despite initial infection with a diverse population (64). 
The formation of IBCs represents a stringent bottleneck that in- 
fluences the bacterial population in the lumen of the mouse blad- 
der (64), and UPEC diversity has been shown to decrease as infec- 
tions progress from the bladder to the kidneys and into the 
bloodstream (65). Thus, the proliferation of a single dominant
fimH strain in acute uncomplicated human UTI may be
a consequence of multiple dynamics during infection. Minor sub- 
populations exhibiting other fimH types could represent either 
genetic drift in the founder population or the transient flux of 
newly introduced strains ascending from the intestinal tract or 
descending from the kidneys (65). 

Our results indicate that identification of dominant uropatho- 
gens in UTIs via single culture representatives is diagnostic at the 
genus level in the majority of cases, and also at the strain level for 
E. coli infections. This implies that the common practice of pre- 
scribing antibiotic treatment based on resistance profiles of iso- 
lates is an effective treatment strategy for acute uncomplicated 
UTIs. Newer, more rapid, and cost-effective methods such as flow 
cytometry (66) and real-time PCR assays (67) that are targeted at 
identifying highly abundant organisms rather than total microbial 
communities will similarly be effective for the diagnosis and mon- 
itoring of UTIs. However, the question remains whether other 
microbial populations present in infected urine samples are caus- 
ally linked to UTI, and further experiments exploring the mecha- 
nisms of microbial pathogenicity in the urinary tract are neces- 
sary. Additionally, while we have demonstrated the utility of fimH 
as a molecular marker for strain typing of E. coli and verified the 
lack of strain-level diversity in E. coli-dominated UTIs, it remains
to be determined how representative single clinical isolates are for 
other common UTI pathogens and other types of infections such 
as complicated or catheter-associated UTIs. 

MATERIALS AND METHODS 

Ethical approval. Ethical approval for this study was obtained from the 
Royal Brisbane and Women's Hospital (RBWH) ethics committee 
(HREC/11/QRBW/107). The need for informed consent was waived by
the institutional review board. 

Study inclusion criteria. In order to be included in the study, individ- 
uals were required to have no history of urological disorders as well as a 
urine sample clinically diagnosed with UTI based on cell counts and cul- 
tures. Epithelial cell (EC), white blood cell (WBC), and red blood cell 
(RBC) counts were performed using phase-contrast microscopy with 
Kova slides (Hycor, CA) at ×10 and ×40 magnifications. Urine cultures
were performed on both MacConkey and blood agar plates, with incubation
at 35°C for 16 to 18 h under aerobic conditions. A clinical diagnosis of
UTI required a WBC count of >10⁷ liter⁻¹, an EC count of <10⁷ liter⁻¹,
and bacterial cultures in excess of 10⁶ CFU · liter⁻¹. Urine samples with a
WBC count of >10⁷ liter⁻¹ and an EC count of >10⁷ liter⁻¹ were also
considered positive if culture counts were >10⁷ CFU · liter⁻¹.
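The inclusion rule above can be written as a small decision function. This is a sketch under stated assumptions: the thresholds are read as 10⁷ cells per liter for WBC and EC counts and 10⁶ or 10⁷ CFU per liter for cultures, the function and parameter names are hypothetical, and the handling of an EC count exactly equal to 10⁷ per liter (treated here under the stricter culture threshold) is an assumption because the text does not cover that case.

```python
def meets_uti_criteria(wbc_per_liter, ec_per_liter, cfu_per_liter):
    """Apply the clinical UTI criteria described in the study inclusion rules."""
    # A WBC count above 1e7 per liter is required in both branches
    if wbc_per_liter <= 1e7:
        return False
    # Low epithelial cell count: culture must exceed 1e6 CFU per liter.
    # High epithelial cell count (possible contamination): the stricter
    # 1e7 CFU per liter threshold applies. EC == 1e7 is an assumed edge case.
    cfu_threshold = 1e6 if ec_per_liter < 1e7 else 1e7
    return cfu_per_liter > cfu_threshold

# Hypothetical values for illustration only
print(meets_uti_criteria(5e7, 1e6, 5e6))   # low EC count, moderate culture count
print(meets_uti_criteria(5e7, 5e7, 5e6))   # high EC count, same culture count
```

The higher culture threshold for samples with many epithelial cells reflects the concern, noted in the Results, that such samples may carry contaminating microbiota.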

Sample collection and initial processing. Midstream urine samples 
were collected from patients presenting at the RBWH for microbiological 
analysis. All urine samples were deidentified and frozen at −20°C prior to
DNA isolation for pyrosequencing. DNA was extracted from 1 ml of each
urine sample using the Nucleospin tissue kit (Macherey-Nagel, Düren,
Germany) according to the manufacturer's protocol for hard-to-lyse organisms.
Extractions were performed in duplicate on a subset of samples
for biological replication. DNA was stored at −20°C prior to further processing.
Cultured bacterial isolates were also obtained for each urine sample.
Patient demographic data (gender and age) were obtained and
matched to all deidentified samples. 

Phenotypic and molecular analysis of clinical isolates. Bacterial iso- 
lates were identified to the species level using phenotypic assays, including 
the Vitek2 microbial identification system (bioMérieux). A single colony 
representing the dominant colony type was selected from the original 
culture plate and used to inoculate 5 ml of sterile Luria-Bertani (LB) liquid 
medium. Cultures were incubated overnight at 37°C with shaking 
(150 rpm). Cells were centrifuged at 4,000 × g for 15 min, resuspended in 
20% glycerol liquid LB medium, and stored in 1-ml aliquots at −80°C. 

Isolates were identified at the molecular level by Sanger sequencing of 
the 16S rRNA gene using the primers 926F and 1392wR (see Table S5 in 
the supplemental material). A 50-µl PCR was run for each sample. Direct 
lysis multiplex PCR was carried out using a 25-µl reaction volume 
containing 2.5 µl 10× buffer, 10 mM (each) deoxynucleoside triphosphate 
(dNTP) mix, 2 mM MgCl₂, 15 µg bovine serum albumin (BSA), 
12.5 pmol of each primer, 1 unit Taq polymerase (Fisher Biotec, 
Australia), and 0.5 µl of the culture glycerol stock as the DNA template. PCR was 
performed using an Applied Biosystems Veriti thermal cycler under the 
following conditions: cell lysis at 96°C for 10 min; initial denaturation at 
95°C for 3 min; 30 cycles of 95°C (30 s), 55°C (45 s), and 72°C (1 min 30 s); 
and a final extension at 72°C for 10 min. PCR products were then purified 
using a modified Exo-Sap protocol. Briefly, 2 µl of exonuclease and 
Antarctic phosphatase master mix (Affymetrix, Santa Clara, CA) was 
added to 20 µl of PCR product. Samples were then incubated in an 
Applied Biosystems Veriti thermal cycler for 15 min at 37°C and then for 
15 min at 80°C to completely inactivate the enzymes. Purified PCR products 
were submitted for Sanger sequencing using the 926F primer at Macrogen 
Inc. (Seoul, South Korea). 16S Sanger sequences were manually trimmed 
and compared to the Greengenes database for taxonomic assignment us- 
ing BLAST (68, 69). 

Phylogroups for E. coli isolates were determined by a modified direct 
lysis triplex PCR method using primers targeting two genes (chuA and 
yjaA) and one anonymous DNA fragment (TspE4.C2) (see Table S5 in the 
supplemental material) as previously described (38). To assay for the pres- 
ence of functional type 1 fimbriae, E. coli isolates were grown in overnight 
5-ml static cultures under aerobic conditions at 37°C. Subsequently, 10 µl 
of each overnight E. coli culture was mixed with 10 µl of 5% yeast solution, 
and agglutination was observed as previously described (39). 
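
The triplex PCR readout for chuA, yjaA, and TspE4.C2 is conventionally interpreted with a dichotomous decision tree. The sketch below assumes the widely used four-group scheme (A, B1, B2, and D); the modified method applied in this study may differ in detail, and the function name is hypothetical. 

```python
def assign_phylogroup(chua: bool, yjaa: bool, tspe4c2: bool) -> str:
    """Assign an E. coli phylogroup from triplex PCR band presence.

    Assumes the standard dichotomous tree for the chuA / yjaA / TspE4.C2
    triplex assay; this is an illustration, not the study's exact procedure.
    """
    if chua:
        # chuA-positive strains fall into B2 or D, resolved by yjaA.
        return "B2" if yjaa else "D"
    # chuA-negative strains fall into B1 or A, resolved by TspE4.C2.
    return "B1" if tspe4c2 else "A"
```

Under this scheme, only two of the three markers are informative for any given isolate: yjaA is not consulted for chuA-negative strains, and TspE4.C2 is not consulted for chuA-positive strains. 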

Primers targeting the fimH gene were designed based on regions of the 
gene conserved across different E. coli strains, including an outgroup spe- 
cies, Escherichia albertii, using ARB (70). Using the fimH primers 72F and 
563R (see Table S5 in the supplemental material), direct lysis PCR was 
carried out in a 25-µl reaction volume as described for 16S sequencing. 
PCR products were purified using the modified Exo-Sap protocol and 
sequenced from both the forward and reverse primers at Macrogen Inc. 
(Seoul, South Korea). Each pair of fimH Sanger sequences (forward and 
reverse) was aligned and assembled, and consensus calls were performed 
automatically according to base quality, using the Geneious software 
package. Ambiguous base calls were checked manually using chromato- 
grams. Sequences were trimmed to 320 bp for comparison with fimH 
amplicon pyrosequences as described below. 

Quantitative real-time PCR. Quantitative real-time PCR was per- 
formed on extracted urine DNAs using the 16S primers 803Fa and 
1392wR (see Table S5 in the supplemental material) and SYBR-Green 
MasterMix (Life Technologies, Carlsbad, CA, USA) on the ABI 7900 
platform to assess total microbial load in the UTI samples. Real-time PCR 
(qPCR) was performed in triplicate at two dilutions using the following 
cycling conditions: 10 min at 95°C and 40 cycles of 15 s at 95°C followed by 
1 min at 60°C. A melt curve was produced by running a cycle of 2 min at 
95°C and a last cycle of 15 s at 60°C. 16S copy number was determined 
based on a standard curve constructed from a dilution series of E. coli 
strain DH10B genomic DNA. 
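
Copy number estimation from a standard curve can be illustrated as follows. This is a minimal sketch assuming the usual log-linear relationship between threshold cycle (Ct) and starting copy number; the function names are hypothetical, and any values passed in are for illustration only, not data from this study. 

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate starting 16S copy number."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the slope (1.0 corresponds to perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0
```

For a sample run in triplicate at a given dilution, the estimate would be the mean of the three back-calculated values multiplied by the dilution factor. 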



16S community profiling. Fusion primers containing 454 adaptor se- 
quences and oligonucleotide bar codes were used to amplify the V5, V6, 
V7, and V8 regions of the 16S rRNA gene. The forward primer was a 
mixture of four variants (803Fa, 803Fb, 803Fc, and 803Fd) in a ratio of 
2:1:1:1 used in conjunction with the reverse primer 1392wR (see Table S5 
in the supplemental material) (71). PCR was performed as described for 
isolate 16S Sanger sequencing using 50-µl reaction mixtures, with a 
unique bar-coded reverse primer used for each sample. Amplicons were 
purified with the Agencourt AMPure XP system (Beckman Coulter, 
Danvers, MA) according to the manufacturer's instructions. The 
concentration of purified amplicons was quantified on a Qubit fluorometer 
(Invitrogen Corp., Carlsbad, CA) with the Qubit double-stranded DNA 
(dsDNA) high-sensitivity (HS) assay kit. Subsequently, equimolar 
amounts of bar-coded amplicons were pooled and sequenced on the 454 
Genome Sequencer (GS)-FLX Titanium platform at the Australian Centre 
for Ecogenomics. 
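
Equimolar pooling from fluorometric concentrations can be sketched as below. The conversion assumes an average mass of 660 g/mol per base pair of double-stranded DNA; the amplicon length and the target amount per sample are hypothetical parameters, not values reported in this study. 

```python
def conc_nM(ng_per_ul: float, length_bp: int, da_per_bp: float = 660.0) -> float:
    """Convert a dsDNA concentration from ng/µl to nM (equivalently fmol/µl)."""
    return ng_per_ul * 1e6 / (da_per_bp * length_bp)


def pooling_volumes(conc_ng_per_ul: dict, length_bp: int, target_fmol: float) -> dict:
    """Volume (µl) of each bar-coded amplicon that contributes target_fmol."""
    return {
        sample: target_fmol / conc_nM(conc, length_bp)
        for sample, conc in conc_ng_per_ul.items()
    }
```

Because all amplicons in a pool share the same primer pair and therefore nearly the same length, equimolar pooling here is close to pooling equal masses. 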

Amplicon sequences were checked for chimeras using UCHIME ver- 
sion 4.1 (72) and then quality filtered, trimmed to 300 bp, and assigned to 
their respective samples using QIIME (73). Sequence clustering at 97% 
identity was performed using CD-HIT-OTU-454, which detects and ad- 
justs homopolymer errors in amplicon sequences (74). Representative 
centroid sequences for each cluster were compared to the Greengenes 
database (February 2011 release) for taxonomy assignment using BLAST 
(68, 69). Rarefaction analysis was performed using QIIME (73). Sequence 
libraries were normalized to 1,000 sequences per sample using a repeated 
subsampling procedure as described in reference 75 prior to further anal- 
ysis. Genus-level taxonomy for the normalized OTU table was visualized 
using the heatmap.2 function in the R package gplots (76). The Shannon 
index was calculated using QIIME, and beta-diversity was assessed using 
the generalized Unifrac metric (77). 
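
Normalization to a fixed depth and the Shannon index can be illustrated with the sketch below. It is a simplified stand-in, not the procedure of reference 75 or the QIIME implementation: counts are averaged over repeated random draws without replacement, the logarithm base of 2 is an assumption, and all function names are hypothetical. 

```python
import math
import random
from collections import Counter


def subsample_counts(counts: dict, depth: int, rng: random.Random) -> Counter:
    """Draw `depth` sequences without replacement from an OTU count table."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        raise ValueError("library is smaller than the requested depth")
    return Counter(rng.sample(pool, depth))


def repeated_subsample(counts: dict, depth: int = 1000, repeats: int = 100, seed: int = 0) -> dict:
    """Mean OTU counts over repeated subsampling to a fixed depth."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(repeats):
        totals.update(subsample_counts(counts, depth, rng))
    return {otu: totals[otu] / repeats for otu in counts}


def shannon(counts: dict, base: float = 2.0) -> float:
    """Shannon diversity index of a count table (zero counts are ignored)."""
    total = sum(counts.values())
    return -sum(
        (n / total) * math.log(n / total, base)
        for n in counts.values()
        if n > 0
    )
```

A community dominated by a single taxon yields an index near zero, which is the pattern expected for the low-richness UTI communities described here. 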

fimH community profiling. Partial fimH gene sequences from urine 
samples were amplified using the primers fimH72F and fimH563R (see 
Table S5 in the supplemental material) modified to include 454 adaptor 
sequences and a sample-specific oligonucleotide bar code in the forward 
primer. PCRs and cycling conditions were identical to those used to am- 
plify fimH from clinical isolates. Bar-coded amplicon preparation and 
sequencing were performed as described for 16S amplicons. 

fimH sequences were quality filtered, trimmed to 320 bp (to maximize 
phylogenetic resolution), and separated by sample using QIIME. Clustering 
was performed at 100% similarity using CD-HIT-OTU-454, and singleton 
clusters were discarded. OTU representative sequences were manually 
checked for uncorrected homopolymer errors in Geneious using the 
conserved length of fimH. Briefly, reference sequences were aligned with 
the fimH sequence from E. coli K-12 (GenBank accession number 
NC_000913.2), and sequences not conforming to the conserved length 
due to deletions or insertions in homopolymer regions were manually 
corrected. Following correction, representative sequences were reclus- 
tered at 100% similarity using CD-HIT-EST (74). Chimeric sequences 
that did not align with the reference were also removed. Rarefaction 
curves were generated using QIIME (73). Sequence libraries were normalized 
to 1,000 sequences per sample prior to the creation of heat maps and 
calculation of the Shannon diversity index using QIIME (73). Full-length 
reference and partial pyrosequence amplicon fimH nucleotide sequences 
were aligned in ARB with ClustalW using default settings, and the align- 
ment was masked using a 50% consensus filter (70). The fimH ARB data- 
base is available upon request. An evolutionary distance tree was con- 
structed in ARB using the Olsen correction and neighbor-joining method. 
The naming system of the phylogenetic tree was adapted from reference 
27, and subtrees were manually identified. 
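
The effect of clustering at 100% similarity and discarding singletons can be approximated by exact-match dereplication of trimmed reads. This sketch is a simplified stand-in and does not reproduce CD-HIT-OTU-454, which additionally handles homopolymer errors; the function name and parameters are hypothetical. 

```python
from collections import Counter


def dereplicate(reads, trim_len: int = 320, min_size: int = 2) -> dict:
    """Group identical trimmed reads and drop clusters below min_size.

    Reads shorter than trim_len are discarded; the rest are truncated to
    trim_len so that identity is judged over the same region.
    """
    trimmed = [r[:trim_len].upper() for r in reads if len(r) >= trim_len]
    counts = Counter(trimmed)
    # most_common() orders clusters from largest to smallest.
    return {seq: n for seq, n in counts.most_common() if n >= min_size}
```

Discarding singleton clusters removes most sequences that differ from a true fimH type only by a sequencing error, at the cost of losing genuinely rare variants. 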

Nucleotide substitutions were described using fimH from E. coli K-12 
as the reference sequence. OTU representative sequences were compared 
to isolate fimH sequences using BLAST and pairwise alignments in Ge- 
neious. Representative sequences were translated using TranSeq (78) and 
aligned in Geneious to the translated E. coli K-12 sequence and to fimH 
ST131 clone reference sequences (40) using ARB (70). Amino acid 
substitutions were named according to the convention in reference 21. 

Illumina genomic and metagenomic sequencing. Indexed whole- 
genome sequencing libraries were prepared with the Nextera DNA sample 
preparation kit (Illumina, San Diego, CA). Libraries were pooled such 
that each isolate genome was approximately 1/80 of the pool and the 
metagenomes were approximately 9/80 of the pool. This pool of libraries 
was submitted to the Queensland Centre for Medical Genomics (University 
of Queensland, QLD, Australia), and a single lane of 2 × 100-bp 
paired-end data was generated on an Illumina HiSeq2000 sequencer (see 
Table S1 in the supplemental material). 
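
The pooling fractions can be checked with simple arithmetic. The sketch below assumes eight isolate genomes and eight metagenomes, matching the eight urine and isolate pairs selected for shotgun sequencing; the function name is hypothetical. 

```python
from fractions import Fraction


def pool_shares(n_pairs: int = 8,
                isolate_share: Fraction = Fraction(1, 80),
                metagenome_share: Fraction = Fraction(9, 80)) -> dict:
    """Total share of the sequencing pool taken by each library type."""
    isolates_total = n_pairs * isolate_share
    metagenomes_total = n_pairs * metagenome_share
    return {
        "isolates_total": isolates_total,
        "metagenomes_total": metagenomes_total,
        "pool_total": isolates_total + metagenomes_total,
    }
```

Under these assumptions the shares sum to the whole pool, with the metagenomes together receiving nine times the sequencing effort of the isolate genomes. 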

Read pairs were checked for overlap using SeqPrep 
(https://github.com/jstjohn/SeqPrep) and clipped to remove low-quality bases using 
Nesoni (http://bioinformatics.net.au/software.nesoni.shtml). De novo 
assemblies were generated from the isolate reads for each sample using 
Velvet version 1.2.10 with the help of VelvetOptimiser to determine 
optimum assembly parameters (79). Isolate genomes were checked for 
completeness using Phylosift (https://github.com/gjospin/PhyloSift) and 
Metachecka (https://github.com/Ecogenomics/PhylogeneticM) to identify 
sequences corresponding to 111 single-copy genes. 

For each of the metagenome samples, human contamination was removed 
by mapping reads to the human genome (hg19) using BWA on 
default settings (80). Reads that did not map to the human genome were 
recovered and mapped to their respective de novo isolate contigs using 
BWA (default settings). Reads that remained unmapped against the iso- 
late were compared to the NCBI database of complete microbial genomes 
using BLAST. Microbial taxonomy from best BLAST hits was summa- 
rized using MEGAN (81). 

Statistical analysis. All statistical analyses were conducted in R (82). 
Two-way correlations between clinical, microbiological, and molecular 
covariates were determined using Fisher's exact test (two discrete covari- 
ates), the Mann-Whitney U test (one discrete and one continuous cova- 
riate), or Kendall's Tau (two continuous covariates). The Mann-Whitney 
U test was also used to compare average Unifrac distances between mi- 
crobial community biological replicates. Associations between covariates 
and total microbial communities were determined using generalized 
PERMANOVA (83) with the unweighted, weighted, and generalized Uni- 
frac distance matrices as inputs (77). Principal component analysis based 
on generalized Unifrac distances was performed using the function 
prcomp and visualized with the s.class function in the ADE4 package (84). 
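
The choice of test for each pair of covariates follows from their types, which can be sketched as below. The analyses in this study were run in R; this Python illustration uses hypothetical function names, and the Kendall statistic shown is the simple tau-a without tie correction, which is an assumption and may differ from the variant computed in R when ties are present. 

```python
from itertools import combinations


def choose_test(x_is_discrete: bool, y_is_discrete: bool) -> str:
    """Select the two-way test according to covariate types."""
    if x_is_discrete and y_is_discrete:
        return "Fisher's exact test"
    if x_is_discrete or y_is_discrete:
        return "Mann-Whitney U test"
    return "Kendall's tau"


def kendall_tau_a(x, y) -> float:
    """Kendall's tau-a: (concordant - discordant) / total pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs
```

For example, a comparison of gender with fimH type would use Fisher's exact test, whereas age against 16S copy number would use Kendall's tau. 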

Nucleotide sequence accession number. All sequence data obtained 
from this study have been deposited in GenBank under BioProject 
PRJNA174753. 

SUPPLEMENTAL MATERIAL 

Supplemental material for this article may be found at 
http://mbio.asm.org/lookup/suppl/doi:10.1128/mBio.01064-13/-/DCSupplemental. 
Figure S2, TIF file, 0.3 MB. 